docs: update usage and resources #886
Conversation
- rewrites for clarity
- standardized references
- removed blockquotes
- added Docusaurus-style admonitions
Added some comments :)
Maybe it would be worth splitting the page into two?
- Usage & resources
- Optimization & tips?
```diff
  ## Requirements

- Actors built on top of the [Apify JS SDK](/sdk/js) and [Crawlee](https://crawlee.dev/) use autoscaling. This means that they will always run as efficiently as they can with the memory they have allocated. So, if you allocate 2 times more memory, the run should be 2 times faster and consume the same amount of compute units (1 * 1 = 0.5 * 2). Autoscaling for Python is not yet available, but it is planned for the near future.
+ Actors built with [Apify JS SDK](/sdk/js) and [Crawlee](https://crawlee.dev/) use autoscaling. This means that they will always run as efficiently as they can based on the allocated memory. So, if you double the allocated memory, the run should be twice as fast and consume the same amount of compute units (1 * 1 = 0.5 * 2).
```
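The compute-unit math in the hunk above (1 * 1 = 0.5 * 2) can be sketched as follows; the `computeUnits` helper is illustrative only, not part of any Apify API:

```javascript
// Sketch of the compute-unit (CU) arithmetic from the paragraph above:
// CU consumed = memory in GB × run duration in hours.
function computeUnits(memoryGb, hours) {
  return memoryGb * hours;
}

// Doubling the memory is expected to halve the runtime,
// so the CU cost of the run stays the same:
console.log(computeUnits(1, 1));   // 1 GB for 1 hour   -> 1 CU
console.log(computeUnits(2, 0.5)); // 2 GB for 0.5 hour -> 1 CU
```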
This should be checked with someone from delivery / tooling. Not sure if this is 100% true, as it slightly contradicts the Cheerio section later (with its mention of Node threads).
It feels correct. 4 GB memory = 1 CPU core, so the note about a max memory of 4 GB for Cheerio makes sense; the Node process will only use a single thread there. (Sidenote: that's how we have it implemented; we could try to leverage worker threads or other similar parallelization features of Node to get around this.)
cc @metalwarrior665 for field experience :]
- Yes, 4 GB is a target for non-browser Node.js actors because of the single core restriction.
- Also, Apify SDK doesn't use any autoscaling; there is nothing to scale there. That is only related to Crawlee.
So, as I understand it, the mention of the Apify JS SDK should be removed from this paragraph, as it applies only to Crawlee?
Yep.
- Actors using [Puppeteer](https://pptr.dev/) or [Playwright](https://playwright.dev/) for real web browser rendering require at least `1024MB` of memory.
- Large and complex sites like [Google Maps](https://apify.com/drobnikj/crawler-google-places) require at least `4096MB` for optimal speed and [concurrency](https://crawlee.dev/api/core/class/AutoscaledPool#minConcurrency).
- Projects involving large amounts of data in memory.
This would be better discussed with tooling / delivery.
I think this is fine. The idea is that for a browser Actor to start scaling concurrency, 1 GB is just not very useful, but it can work as a minimum.
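As a back-of-the-envelope illustration of why `1024MB` is only a bare minimum for browser Actors, the memory-to-concurrency trade-off can be sketched like this. All numbers below are assumptions for illustration, not Apify or Crawlee constants:

```javascript
// Illustrative only: estimating how many browser pages fit in allocated memory.
const MB_PER_BROWSER_PAGE = 200; // assumed average footprint of one open page
const BASE_OVERHEAD_MB = 512;    // assumed memory used by the Actor process itself

function estimateConcurrency(allocatedMb) {
  // At least one page must fit, however little memory remains.
  return Math.max(1, Math.floor((allocatedMb - BASE_OVERHEAD_MB) / MB_PER_BROWSER_PAGE));
}

console.log(estimateConcurrency(1024)); // 2  -> enough to run, not to scale
console.log(estimateConcurrency(4096)); // 17 -> room for real concurrency
```

Under these assumed numbers, quadrupling the memory raises the page budget roughly eightfold, which is why the larger allocation is where autoscaled concurrency becomes useful.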
```diff
  ### Maximum memory

- Apify Actors are most commonly written in [Node.js](https://nodejs.org/en/), which uses a [single process thread](https://betterprogramming.pub/is-node-js-really-single-threaded-7ea59bcc8d64). Unless you use external binaries such as the Chrome browser, Puppeteer, Playwright, or other multi-threaded libraries you will not gain more CPU power from assigning your Actor more than 4 GB of memory because Node.js cannot use more than 1 core.
+ Apify Actors are most commonly written in [Node.js](https://nodejs.org/en/), which uses a [single process thread](https://betterprogramming.pub/is-node-js-really-single-threaded-7ea59bcc8d64). Unless you use external binaries such as the Chrome browser, Puppeteer, Playwright, or other multi-threaded libraries you will not gain more CPU power from assigning your Actor more than `4096MB` of memory because Node.js cannot use more than 1 core.
```
I'd change the link to something else - it's quite old, and at the referenced site you have to create an account to read it.
I'd again discuss with tooling / delivery which articles would be good to refer to. (I also mean the next one on multiple threads.)
```diff
  When you run an Actor it generates platform usage that's charged to the user account. Platform usage comprises four main parts:

  - **Compute units**: CPU and memory resources consumed by the Actor.
- - **Data transfer**: Amount of data you transfered between web, Apify platform, and other external systems.
+ - **Data transfer**: The amount of data transferred between the web, Apify platform, and other external systems.
  - **Proxy costs**: Residential or SERP proxy usage.
- - **Storage operations**: Read, write, and other operations towards key-value store, dataset, and request queue.
+ - **Storage operations**: Read, write, and other operations performed on the Key-value store, Dataset, and Request queue.
```
I'd add some info on where to find the run usage - it's visible on the run detail and in the run list.
- changed the link to a site that doesn't require an account
- added new screenshots showing where users can find run usage
```diff
  ### Maximum memory

- Apify Actors are most commonly written in [Node.js](https://nodejs.org/en/), which uses a [single process thread](https://betterprogramming.pub/is-node-js-really-single-threaded-7ea59bcc8d64). Unless you use external binaries such as the Chrome browser, Puppeteer, Playwright, or other multi-threaded libraries you will not gain more CPU power from assigning your Actor more than 4 GB of memory because Node.js cannot use more than 1 core.
+ Apify Actors are most commonly written in [Node.js](https://nodejs.org/en/), which uses a [single process thread](https://dev.to/arealesramirez/is-node-js-single-threaded-or-multi-threaded-and-why-ab1). Unless you use external binaries such as the Chrome browser, Puppeteer, Playwright, or other multi-threaded libraries you will not gain more CPU power from assigning your Actor more than `4096MB` of memory because Node.js cannot use more than 1 core.
```
I believe it's "single thread process", not "single process thread".
```diff
- > It is possible to [use multiple threads in Node.js-based Actor](https://dev.to/reevranj/multiple-threads-in-nodejs-how-and-what-s-new-b23) with some configuration. This can be useful if you need to offload a part of your workload.
+ It's possible to [use multiple threads in Node.js-based Actor](https://dev.to/reevranj/multiple-threads-in-nodejs-how-and-what-s-new-b23) with some configuration. This can be useful if you need to offload a part of your workload.
```
We also have a guide directly for Crawlee: https://crawlee.dev/docs/3.7/guides/parallel-scraping
Later we can add a horizontal scaling guide, especially since we now have RequestQueueV2, which supports multiple Actors simultaneously.